Main
David Zhang
Bioinformatics software engineer with experience operating across the entire software development lifecycle. Skilled in prototyping and benchmarking innovative solutions, as well as implementing, testing, and integrating software into production-ready pipelines.
Work Experience
Senior bioinformatics engineer
London, UK (hybrid)
Present - 2024
- Scaled machine learning packages to derive insights from single-cell perturb-seq data containing millions of cells. Directed the project and pulled together insights to inform company direction.
- Created a data pipeline to ingest, version-control and deploy a Neo4j knowledge graph. Automated the deployment and release of the graph to AWS via CI using Terraform.
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
2024 - 2022
- Developed, benchmarked and productionised bioinformatic pipelines in Nextflow for the precision oncology product.
- Engineered Nextflow and Snakemake pipelines that perform alignment, variant calling, driver annotation and therapy matching using solid tumour sequencing data.
Bioinformatician internship (2 months)
London, UK (remote)
2021
- Created a reproducible aberrant splicing detection pipeline using Docker for drug target discovery in C9orf72 ALS patients.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Thesis: Using transcriptomics to improve the genetic diagnosis rate of rare disease patients.
- Developed and released software that facilitates transcriptomic analyses with a focus on diagnostics.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Thesis: The role of mitochondrial dysfunction in Xeroderma pigmentosum.
- Grade: Merit (68%)
- Awarded post-graduate support scheme bursary (£10,000)
BSc, Biomedical science
University College London
London, UK
2015 - 2012
- Thesis: Investigating the function of CYFIP1 in the development of rat hippocampal neurons.
- Grade: 2:1 (69%)
H.S.
Queen Elizabeth’s School
Barnet, UK
2012 - 2007
- Grades: Maths (A*), Biology (A*), Chemistry (A*), Sociology (A).
Software & programming
Portfolio website
N/A
N/A
Present - 2022
- My website is built using Django/Bootstrap 5, deployed with Heroku and showcases the five projects I’m most fond of.
Python packages
N/A
N/A
2023 - 2021
- codino: Converts a codon design to the expected amino acid frequencies, and vice versa. Author.
- autogroceries: Use Selenium to automate your grocery shop. Author.
- stravaboard: A dashboard for flexibly displaying and tracking Strava runs built using Streamlit. Author.
R packages
N/A
N/A
2022 - 2020
- ggtranscript: Visualising transcript structure and annotation using ggplot2. Author.
- megadepth: BigWig and BAM related utilities. An R wrapper for the megadepth software developed by Chris Wilks. Co-author.
- dasper: Detection of aberrant splicing events in RNA-sequencing. Author.
Selected Publications
A complete list of publications is available via Google Scholar
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
N/A
2022
- Gustavsson EK, Zhang D, Reynolds RH, Garcia-Ruiz S, Ryten M
- Role: Co-first author
- DOI: https://doi.org/10.1093/bioinformatics/btac409
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
N/A
2021
- Collier J, Guissart C, Oláhová M, Sasorith S, Piron-Prunier F, Suomi F, Zhang D, Martinez-Lopez N, Leboucq N, Bahr A, Azzarello-Burri S, Reich S, Schöls L, Polvikoski TM, Meyer P, Larrieu L, Schaefer AM, Alsaif HS, Alyamani S, Zuchner S, Barbosa IA, Deshpande C, Pyle A, Rauch A, Synofzik M, Alkuraya FS, Rivier F, Ryten M, McFarland R, Delahodde A, McWilliams TG, Koenig M, and Taylor RW.
- Role: Co-first author
- DOI: https://doi.org/10.1056/NEJMoa1915722
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
N/A
2021
- Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B.
- Role: R package developer.
- DOI: https://doi.org/10.1093/bioinformatics/btab152
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders.
Science Advances
N/A
2020
- Zhang D, Guelfi S, Ruiz SG, Costa B, Reynolds RH, D’Sa K, Liu W, Courtin T, Peterson A, Jaffe AE, Hardy J, Botia JA, Collado-Torres L and Ryten M.
- Role: First Author.
- DOI: https://doi.org/10.1126/sciadv.aay8299